Method for isolating a fault from error messages

ABSTRACT

A method, a use of the method, a system, a use of the system, a computer program code element, and a computer readable medium for automatically isolating primary faults from a system log including actual error messages in a system controlled by an object oriented program. Messages are isolated through clustering.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Swedish patent application 9904008-1 filed Nov. 3, 1999 and is the national phase under 35 U.S.C. § 371 of PCT/SE00/02166.

TECHNICAL FIELD

The present invention pertains to a method, a use of the method, a system, a use of the system, a computer program code element, and a computer readable medium for isolating a fault from a plurality of error messages in a system log of a process controlled by an object oriented program.

PRIOR ART

Developing control systems for complex systems is a difficult and increasingly important task. Larger control systems have traditionally been developed using structured analysis and functional decomposition, see for example T. DeMarco, Structured Analysis and System Specification, Prentice-Hall, 1979. When applying traditional programming, it is possible to optimize the program code with respect to real-time requirements, and it is easier to generate concise error messages since the state of the whole control program is known at each point in time.

Today many large systems are designed using an object oriented approach. This has several advantages over traditional approaches, including a better ability to cope with complexity and to achieve good maintainability and reusability. However, since error messages originating from a certain fault often reflect the control system design and architecture, it can be very difficult for an operator to understand which error message, of a plurality of error messages, is most relevant and closest to the real fault.

One method to use for fault isolation is to compile a database, or use an expert system, see e.g. W. T. Scherer and C. C. White, A survey of expert systems for equipment maintenance and diagnostics, in S. G. Tzafestas, editor, Knowledge-based system diagnosis, supervision and control, pages 285–300, Plenum, New York, 1989, and S. Tzafestas and K. Watanabe, Modern approaches to system/sensor fault detection and diagnosis, Journal A, 31(4):42–57, December 1990. But a highly configurable system has the disadvantage that every installation of the control system needs a new database. Also, changes made in the system itself can render an extensive database useless.

Another possibility for fault isolation is to use a model that is tailored for fault isolation. Some general examples can be found in W. Hamscher, L. Console, and J. de Kleer, editors, Readings in model-based diagnosis, Morgan Kaufmann Publishers, San Mateo, Calif., 1992, and M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis, Diagnosability of discrete-event systems, IEEE Transactions on Automatic Control, 40(9):1555–1575, September 1995.

The advantage of using a specialized model is that such a model can contain exactly the information needed for the fault isolation, and the fault isolation procedure can be much simplified. The disadvantage is that this model must be maintained separately. Maintaining such a model is not trivial, and a great deal of work may be needed to keep the model up to date.

Exception handling mechanisms are intended to help improve error handling in software, to make programs more reliable and robust. They are language constructs that facilitate error handling outside of the conventional program flow and at the appropriate level. The exception constructs might also support a programmer in providing more information to the error handler code than is available through the conventional object interface, to facilitate error recovery.

Exception handling mechanisms are by their nature low level constructs, and as such address a fault handling problem from the bottom up. It is worth noting, as pointed out in R. Miller and A. Tripathi, “Issues with exception handling in object-oriented systems”, in Proceedings of the 11th European Conference on Object-Oriented Programming (ECOOP97), Jyväskylä, Finland, June 1997, that the goals of exception handling often stand in direct conflict with the goals of an object oriented architecture, the very same goals of encapsulation and modularity that cause the fault propagation problem addressed in the present invention.

SUMMARY OF THE DESCRIBED INVENTION

The present invention provides a model based method for fault isolation by introducing an extra layer between an operator and/or process control means and actual error messages produced by a control system. This approach can be summarized as having a liberal error message sending policy for individual objects, with a structured signature for the messages, and a clustering method to isolate fault scenarios. A model of the system is then used to isolate the object/objects closest to the fault in each fault scenario. The message(s) closest to the fault, and hence most relevant, can then be presented to, for example, an operator, who can then, in accordance with the present invention, take measures to deal with the fault more easily than was previously possible.

In order to overcome disadvantages and problems related to fault isolation today, for example, for processes involving an industrial robot, the present invention sets forth a method for isolating a fault from a plurality of error messages in a system log of a process controlled by an object oriented program.

In an embodiment of the present invention, the method includes the steps of:

creating a predefined signature for each error message of the plurality of error messages,

forming a time period to frame, from the plurality of error messages, the error messages belonging to a fault scenario of the fault,

clustering the error messages that fall within the time period,

forming from the signatures of the clustered error messages a base graph, which is a formal representation of the fault scenario, carrying cause-effect relation information of the error messages,

analyzing the base graph rules to isolate the fault, and

reporting information containing the fault to a control means, which provides measures taking care of the fault.

Each signature is made to contain information on complaint and complainer in an embodiment of the invention.

An explanation model is formed from an underlying assumption of communication between system entities needed to carry out a specific task, which system entities include objects, packages and threads, and the signatures are defined on the basis of the explanation model in an embodiment of the invention.

The mentioned time period starts with the first error message in a fault scenario and ends with a stop message in an embodiment of the invention.

Reported information in accordance with an embodiment of the present invention contains a reliability estimate of the fault.

In one embodiment of the present invention, in case of an inconclusive base graph, the analyzing step further includes the steps of:

extending the base graph with the help of a system model, containing information on the dependency relation between system entities, and

analyzing the extended base graph with a set of predetermined explanation rules to isolate the fault.

Another embodiment of the present invention provides that the system model is formed using the “Unified Modeling Language (UML)”.

Still another embodiment of the present invention includes the use of a method according to the above in a process involving an industrial robot.

Further, the present invention provides a system for isolating a fault from a plurality of error messages in a system log of a process controlled by an object oriented program.

In a preferred embodiment of the present invention, each error message of the plurality of error messages includes a predefined signature, and an interface is located between the program and a process control. The interface includes elements for clustering error messages belonging to a fault scenario of the fault, elements for forming from the signatures of the clustered error messages a formal representation, a base graph, of the fault scenario, elements for analyzing the formal representation to isolate the fault, and elements for reporting information containing the fault to the process control, which provides measures taking care of the fault.

Another embodiment of the present invention provides that the signature includes information on complaint, complainer and complainee, and that the signature is formed from an explanation model including an underlying assumption of communication between system entities needed to carry out a specific task, where the system entities include objects, packages and threads.

The time period starts with the first error message in the fault scenario and ends with a stop message in one embodiment of the invention.

A still further embodiment provides that the system includes elements for establishing a reliability estimate of the fault and that the fault information contains the estimate.

In case of the base graph being inconclusive, the system further includes, in one embodiment of the invention, elements for extending the base graph with the help of a system model, containing information on the dependency relation between system entities, and elements for isolating the fault from the extended graph by using predetermined explanation rules.

The “Unified Modeling Language (UML)” is used to form the system model in a preferred embodiment of the invention.

Yet another embodiment provides the use of the above described system in a process involving an industrial robot.

A further embodiment of the present invention is set forth by a computer program code element including computer code for enabling a processor to isolate a fault from a plurality of error messages in a system log of a process controlled by an object oriented program.

It enables a processor to carry out the steps of:

forming an interface between the program producing the plurality of error messages and a process control,

creating a cluster from said plurality of error messages belonging to a fault scenario of the fault,

analyzing the cluster using predefined error message signatures and a predetermined system model to isolate the fault, and

reporting information containing the isolated fault to the process control, which provides measures taking care of the fault.

One embodiment provides the code element contained in a computer readable medium.

Another embodiment of the invention provides that the code element is supplied at least in part over a network such as the Internet.

Also set forth through the present invention is a computer readable medium. It contains a computer program code element including computer code for enabling a processor to isolate a fault from a plurality of error messages in a system log of a process controlled by an object oriented program, by which the processor is enabled to carry out the steps of:

forming an interface between the program producing the plurality of error messages and a process controlling means,

creating a cluster from the plurality of error messages belonging to a fault scenario of the fault,

analyzing the cluster using predefined error message signatures and a predetermined system model to isolate the fault, and

reporting information containing the isolated fault to the process control, which provides measures taking care of the fault.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and for further objectives and advantages thereof, reference may now be made to the following description taken in conjunction with the accompanying drawings, in which:

FIG. 1 schematically illustrates a conventional prior art message system with a system log;

FIG. 2 schematically illustrates a message system for fault isolation in accordance with the present invention;

FIG. 3 schematically illustrates graphical notations for different error messages in accordance with an embodiment of the invention;

FIG. 4 schematically illustrates a flow chart over an analysis procedure in accordance with the present invention;

FIG. 5 schematically illustrates four possible properties of a base graph node in accordance with the present invention; and

FIG. 6 schematically illustrates a generic example of a UML traversal algorithm.

TABLE

Table 1 on the last page of the present description illustrates server chains returned by a UML traversal algorithm for the model in accordance with FIG. 6.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The features of the present invention provide a more abstract view regarding fault isolation than the prior art, and address the problem, mainly fault propagation, from a top-down perspective. The invention is not intended to replace low level error handling, but to be used in conjunction with low level error handling in some form. This can, e.g., be a disciplined use of return codes or full-fledged exception handling mechanisms.

In object oriented design, encapsulation and modularity are fundamental and important design goals for reuse, maintenance and complexity reasons. An often conflicting design goal lies in the need to suppress unnecessary, propagating error messages and ultimately give an operator and/or process control means a concise picture of a fault situation, in order to act upon it.

The issue is how a configurable and safety critical control system with an object-oriented architecture handles and isolates run-time faults and alarms, and specifically the problems that arise due to the object oriented structure and complexity of the control system itself.

The control system can be multi-threaded, with several concurrent tasks communicating both asynchronously and synchronously. The objects in the system according to the present invention include both pure software objects and objects corresponding to hardware. The invention concentrates mainly on fault handling for a fully operational system, and installation and startup problems are not discussed.

The term fault here means a run-time change or event, normally in hardware, that eventually causes the system to abort conventional operation. When a fault occurs during conventional operation, the system often generates a large number of error messages, for example, bursts of errors. Error messages are sent by individual objects to notify an operator or a means for operating when the object has detected an error condition. The individual object does not in general know how close it is to the real fault, or whether sufficient reporting is already taking place, and hence whether it should report to the operator or not.

FIG. 1 schematically illustrates a conventional prior art message system with a system log, where box or block 100 depicts input signals from sensors, motors, I/O equipment etc. Reference numeral 110 represents an object oriented control system controlling an arrangement such as an industrial robot and checking for messages in a system log 120 depending on its input signals. If there are messages corresponding to the processing of signals by the control system 110, messages regarding faults or any other message that has to be presented to an operator of the system/arrangement are output as information for maintenance etc. Prior art control systems and system logs are not designed to pinpoint primary faults or to sort them out from among a plurality of messages in order to determine maintenance, repair etc. for an operator of, for example, a complex control system; this is restricted to highly skilled engineers or the like.

When an error condition is encountered, the affected object normally lets the proper interface method return with an error code to the calling object. It might also decide to continue its conventional operation, e.g., in the case of an event driven thread's main loop. The returned error code in turn can be regarded as an error condition by the calling object. If deemed appropriate by designers, the object registers an error message to a central log 120. If an error condition is deemed so serious by a detecting object that conventional operation cannot continue, a special asynchronous call is made that performs an emergency stop.

For objects in a program close to each other it is possible to suppress error messages by information passing, but this is not always feasible: it is an explicit aim of object oriented modeling to encapsulate knowledge about the internal state of objects and to achieve independence between groups of collaborating objects (i.e., encapsulation and modularity). Moreover, the control system that is considered here is safety critical. In case of a serious fault, the first priority is to bring the system or arrangement to a safe state. Only then is it possible to start analyzing what may have caused the fault.

A primary concern is a situation where an operational system is normally running without direct supervision. Since the error messages stemming from a certain fault often reflect the control system design and architecture, it can be very difficult for the operator to understand which error message is most relevant and closest to the real fault.

FIG. 2 schematically illustrates a message system for fault isolation in accordance with the present invention. The invention sets forth a model based method for fault isolation by introducing an extra layer 210, 220, 230 between the operator or means for operating and the actual error messages 200 produced by the control system 110. The approach can be summarized as having a liberal error message sending policy for individual objects, with a structured signature for the messages, and a clustering 210 method to isolate (fault) scenarios. A model 220 of the system is then used to isolate the object/objects closest to the fault in each fault scenario (cluster 1 and cluster 2 in FIG. 2). The message(s) closest to the fault (Primary fault message 1 and Primary fault message 2 in FIG. 2), and hence most relevant, can then be presented to an operator or means for operating.

A basic terminology is now established:

Definition: Control System Log

The system log 200 constitutes a list of events recorded by the system in chronological order in the form of messages.

The terms event and message may be used interchangeably.

Definition: Fault

A fault is a run-time change or event, often in hardware 100, that eventually causes the system to deviate from normal operation.

Definition: Error Message

An error message is a message sent by an individual software object to the operator, as a response to the detection of an error condition for that object.

The kind of error condition that can arise in an object is closely related to the refinements of the error message, which is further discussed below.

Definition: Fault Scenario

A fault scenario consists of all instantiated system entities and events involved in the origin and manifestation of a specific fault.

Examples of system entities are objects, links, threads, hardware equipment and physical connections.

The system log may contain messages belonging to several different fault scenarios. In order to perform analysis of a single fault scenario, messages that have occurred during the same fault scenario are clustered 200, 210 together according to the present invention. A time period is formed to frame, from a plurality of error messages, the error messages belonging to a fault scenario of the fault.

A clustering 210 method for clustering the error messages that fall within the time period according to the present invention is based on a maximal time span of a cluster, here called the cluster period, and on divider events. A divider event indicates that a new cluster begins with this event even if the cluster period has not yet expired. Each cluster is identified and passed on for analysis 230. An example of a basic clustering algorithm is found below.

The Clustering Algorithm Pseudo Code:

Name: ClusterLog
Input: A system log
Output: A stream of clusters
WHILE new event exist OR time out/end of file DO
  IF event exist THEN
    IF event in divider group THEN
      Analyze current cluster.
      Create new cluster and add current event.
    ELSE
      IF cluster exists THEN
        IF current event (time) − start event (time) < Cluster period THEN
          Add current event to cluster
        ELSE
          Analyze current cluster.
          Create new cluster and add current event.
      ELSE
        Create new cluster and add current event.
  ELSE
    Analyze current cluster.

Design parameters for clustering are the events in the divider group and the length of the cluster period. An adaptive scheme could be considered with, e.g., increasingly larger (or smaller) cluster periods or concatenation of clusters.
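The ClusterLog pseudo code can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `Event` record, its `time` and `name` fields, and the function name `cluster_log` are hypothetical, time is assumed to be in seconds, and "analyzing" a cluster is reduced to appending it to a result list.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float  # timestamp in seconds (assumed unit)
    name: str    # message identifier (hypothetical field)


def cluster_log(events, divider_group, cluster_period):
    """Split a chronologically ordered log into a list of clusters."""
    clusters = []
    current = []  # the cluster currently being filled
    for event in events:
        is_divider = event.name in divider_group
        # The cluster period is measured from the first event of the cluster.
        expired = bool(current) and (
            event.time - current[0].time >= cluster_period
        )
        if current and (is_divider or expired):
            clusters.append(current)  # hand the finished cluster to analysis
            current = []
        current.append(event)
    if current:  # end of log: flush the last cluster
        clusters.append(current)
    return clusters
```

With a divider group of `{"stop"}` and a cluster period of 10 seconds, events at times 0, 1, 2 ("stop"), 3 and 20 would be split into three clusters, the second one starting at the divider event.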

To use a specific explanation model 220 on a fault scenario, it is necessary to know how the error messages in the log relate to the chosen explanation base(s). The general semantics of this relationship, induced by the demands of an object oriented (OO) architecture, is that an error message is a complaint from an (instantiated) entity about an (instantiated) entity. The complainee entity might be the same as the complaining entity (internal error), or the complainee might be unknown for OO architecture reasons. The two entities need in general not be of the same kind, e.g., an object might complain about a thread. An error message signature is the complainer and complainee information for an error message. An error message signature is present in error messages, including information on the sending source object and thread, and if possible, the reason for the error message in the form of information on which system entity failed to perform a requested service.


An error message signature for objects as explanation base is detailed below.

An ordinary method call to another object that returns with an error code is the most common example of an error condition that should result in a relational error message with known complainee. The complainee may be unknown when the responsible object is not known to the complainer for encapsulation or modularity reasons.

A corresponding error condition for a specific error message could be a detected error in hardware or external equipment of which the object encapsulates knowledge.

A graphical notation for the different error message types is introduced and depicted in FIG. 3. Boxes or blocks 30, 32, 34, 36 represent objects. A relational error message with known complainee is denoted with an arrow 38 between the complainer 30 and the complainee 32. A specific error message uses an arrow to self on block 34, annotated with the letter “s”. A relational error message with unknown complainee also uses an arrow to self, on block 36, annotated with the letter “u”. It should be noted that the two types of arrows to self are semantically different. The relational error message with unknown complainee could for example be more suggestively drawn as an arrow ending in a question mark. The notation is chosen for simplicity reasons.
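The three error message types and the signature information described above could be represented along the following lines. This is a minimal sketch, not the format used by the described system: the names `MessageKind` and `Signature`, the field names, and the rule that a complainee is stored only for a known complainee are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MessageKind(Enum):
    RELATIONAL_KNOWN = "relational, known complainee"      # arrow between two objects
    RELATIONAL_UNKNOWN = "relational, unknown complainee"  # arrow to self, "u"
    SPECIFIC = "specific"                                  # arrow to self, "s"


@dataclass(frozen=True)
class Signature:
    complainer: str                   # sending source object
    thread: str                       # thread in which the complainer ran
    kind: MessageKind
    complainee: Optional[str] = None  # assumed to be set only for RELATIONAL_KNOWN

    def __post_init__(self):
        # A complainee must be named exactly when the kind says it is known.
        needs_complainee = self.kind is MessageKind.RELATIONAL_KNOWN
        if needs_complainee != (self.complainee is not None):
            raise ValueError("complainee must be given only for a known complainee")
```

For example, `Signature("MotorDriver", "main", MessageKind.RELATIONAL_KNOWN, "AxisController")` would describe a complaint by one object about another; the object and thread names are invented.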

The information obtainable from error messages adhering to OO design rules is not always enough to form a complete and coherent overview picture of a fault scenario. To complement the error messages, a system model 220 is used that contains information on how system entities, such as software entities, depend on each other to perform their tasks. The model 220 reflects the current system design.

An explanation model is the set of underlying assumptions needed when using a specific type of system entity as the base for fault isolation. System entity types that can be used as base for explanation models are, e.g., objects, packages and threads (tasks). A main idea is to use the structure of the system, manifested in entities as mentioned, to establish cause-consequence relationships between error messages. Examples of model structures are UML class diagrams, object diagrams and task diagrams, see Object Management Group, UML Notation Guide, version 1.1, doc. no. ad/97-08-05, September 1997.

A system entity often has a certain type, from which several individuals can be instantiated, as for class and object or thread type and thread instance. Error messages of course come from instantiated entities. The system model, though, does not need to contain information on instances, but only on types, as for example UML class diagrams. Using the model to “guess” probable dependencies between instances makes such an analysis less reliable. This fact can be used to estimate the reliability of the analysis. A main goal is a cause-consequence relationship between all error messages in a fault scenario, preferably with one unique root cause error message. This is a first measure of reliability.

A design parameter for the analysis is the search depth to use in the model, i.e., how many “silent” entities are to be assumed part of the fault scenario. Another design parameter is priority based addition of cause-consequence relationships using the system model. The priority can be based on properties of the instantiated entities or properties assigned to entities in the model.

For the sake of simplicity in the following presentation, an algorithm is detailed below for the specific explanation model that uses only objects as base. The explanation model is, in short, that the problems an object experiences, reported for example by error messages, can be explained by the object's dependence on other objects and the knowledge and resources they administer. This is a suitable assumption when the interest is mainly in hardware induced faults. A more complete explanation model would, e.g., use both objects and threads as explanation base, and hence also take dependencies between concurrent threads into account.

An analysis can be broken down into three main steps, as schematically illustrated in FIG. 4 through a flow chart over an analysis procedure in accordance with the present invention.

The forming of a rudimentary partial order directly from the clustered 400 error messages can be accomplished given the message signature. It can be represented as a directed graph, which henceforth will be called the error message graph, base message graph 410 or base graph.

An error message graph can possibly be modified by a graph extension 420. All added edges will be relational, since a goal with graph extensions 420 is to make the graph connected. The edges between objects, both from the log and derived, should be read as “complaint about”. A formed base graph consists of system entities describing how they depend on each other in a current fault scenario. The base graph is constructed using information in an error message signature.
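As a rough illustration of how a base graph might be assembled from message signatures, the sketch below reduces each clustered message to a hypothetical `(complainer, kind, complainee)` tuple, where `kind` is one of the assumed strings `"known"`, `"unknown"` or `"specific"`. Relational messages with known complainee become directed "complaint about" edges; the two self-arrow types are kept as per-node marks.

```python
from collections import defaultdict


def build_base_graph(messages):
    """Return (edges, marks) for an iterable of (complainer, kind, complainee)."""
    edges = set()             # (complainer, complainee): "complaint about"
    marks = defaultdict(set)  # node -> subset of {"u", "s"} (self-arrows)
    for complainer, kind, complainee in messages:
        if kind == "known":
            edges.add((complainer, complainee))
            marks[complainer]  # touch both endpoints so they exist as nodes
            marks[complainee]
        elif kind == "unknown":
            marks[complainer].add("u")
        elif kind == "specific":
            marks[complainer].add("s")
        else:
            raise ValueError(f"unknown message kind: {kind!r}")
    return edges, dict(marks)
```

The keys of the returned `marks` dictionary are the nodes of the graph, including nodes that only appear as complainees.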

The purpose of graph extensions 420 is to make the base graph connected and acyclic, with a unique root node. A base graph is used to isolate a probable primary fault, using the semantics provided by a suitable explanation model 220.
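The requirement that the graph be connected and acyclic can be checked mechanically. The sketch below is one possible way to do so, not the analysis prescribed by the description. It assumes that an edge `(a, b)` reads "a complains about b" and that every edge endpoint is listed in `nodes`. It also reports the nodes that complain about no other node; treating such a node as the candidate closest to the fault is an interpretation made here for illustration only.

```python
def check_base_graph(nodes, edges):
    """Return (connected, acyclic, sinks) for a directed graph."""
    nodes = set(nodes)
    if not nodes:
        return True, True, []
    out = {n: set() for n in nodes}
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        out[a].add(b)
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Weak connectivity: depth-first walk that ignores edge direction.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for m in neighbours[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    connected = seen == nodes

    # Acyclicity via Kahn's algorithm: repeatedly remove nodes without
    # incoming edges; a cycle leaves some nodes unremoved.
    indegree = {n: 0 for n in nodes}
    for targets in out.values():
        for m in targets:
            indegree[m] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for m in out[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    acyclic = removed == len(nodes)

    # Nodes that complain about nobody (no outgoing edge).
    sinks = sorted(n for n in nodes if not out[n])
    return connected, acyclic, sinks
```

A chain in which A complains about B and B complains about C would be reported as connected and acyclic, with C as the only node that complains about no one.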

Objects in the graph are classified in terms of their error messages, or rather the edges they are connected to. Since there are three kinds of error messages (or edges), two involving one object and one involving two objects, there are four independent ways an object can be connected to an edge. The objects in a base graph will be classified according to this, as described below. The listing below also provides the chosen notation for the object properties.

Node (Object) Classification and Notation.

U: Is complainer, unknown complainee
Cr: Is complainer, known complainee
Ce: Is complainee
S: Has specific error message

Each object in a base graph can of course have any combination of these properties and, in analogy with the bitwise or-operator in C and C++, the combination of properties will be denoted, for example, as S|Ce. This expression denotes a complainee with a specific error message, which would make it a good candidate for being close to the real fault. See FIG. 5, which schematically illustrates four possible properties of a base graph node in accordance with the present invention, for an object 50 with classification U. If the number of edges of a specific type to which an object is connected is not taken into account, the total number of possible object classifications is 15.

The strategy employed is straightforward: choose the nodes in the base graph that have neither a complainee nor a specific error message, i.e., no one to “blame”, use the model to find possible complainees already in the base graph, and extend the graph accordingly.

With such a strategy in mind, the 15 object classifications are divided into three main groups. (The classifications from the first two groups are named well formed, in contrast to the classifications from the last group.)

Three Node (Object) Classification Groups.

Needs complainee: U, U|Ce and Ce
OK as it is: S, S|Ce, Cr, U|Cr, U|Ce|Cr and Ce|Cr
Undesired cases: U|S, S|Cr, U|S|Ce, U|S|Cr, S|Ce|Cr (probably bad design) or U|S|Ce|Cr
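The bitwise-or notation and the three groups map naturally onto a flag type. The following sketch uses Python's `enum.Flag`; the class name `Prop`, the function `group` and the group labels returned as strings are assumptions made for illustration. Enumerating all non-empty combinations of the four properties confirms the count of 15 and the 3/6/6 split between the groups.

```python
from enum import Flag, auto


class Prop(Flag):
    U = auto()   # is complainer, unknown complainee
    Cr = auto()  # is complainer, known complainee
    Ce = auto()  # is complainee
    S = auto()   # has specific error message


NEEDS_COMPLAINEE = {Prop.U, Prop.U | Prop.Ce, Prop.Ce}
OK_AS_IT_IS = {
    Prop.S, Prop.S | Prop.Ce, Prop.Cr, Prop.U | Prop.Cr,
    Prop.U | Prop.Ce | Prop.Cr, Prop.Ce | Prop.Cr,
}


def group(classification):
    """Map a node classification to one of the three groups."""
    if classification in NEEDS_COMPLAINEE:
        return "needs complainee"
    if classification in OK_AS_IT_IS:
        return "ok as it is"
    return "undesired"  # every remaining combination


# All non-empty combinations of the four properties (bit values 1..15).
ALL_CLASSIFICATIONS = [Prop(v) for v in range(1, 16)]
```

For instance, `group(Prop.S | Prop.Ce)` falls in the "ok as it is" group, while `group(Prop.U | Prop.S)` is an undesired case.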

Pseudo code for a possible “ExtendGraph” algorithm for the whole graph is given below. The graph extension for an individual object has here been placed in a subroutine called extendGraph.

The ExtendGraph Algorithm

Name: ExtendGraph
Input: An error graph
       A UML model
Output: An extended (or unchanged) error graph
ExtendGraph(REF Graph)
{
 for (all objects in Graph)
  if (object type is U, U|Ce or Ce)
   call extendGraph(object)
  if (object type is S, S|Ce, Cr, U|Cr, U|Ce|Cr or Ce|Cr)
   do nothing
  if (object type is U|S, S|Cr, U|S|Ce, U|S|Cr, S|Ce|Cr or U|S|Ce|Cr)
   issue warning
}


When it is decided to extend the graph for a specific object, the UML model, see 430 in FIG. 4, is used to enumerate all possible servers of that object. The object for which complainees are searched will be named the original search object. The class corresponding to the original search object will be named the original search class.

If there is an object already in the base graph that corresponds to the found server class, this information is used to “guess” that this is a valid connection in the current fault scenario, and the graph is extended. Since the search relies on association information, in contrast to instantiated links, and the found servers are only classes, it is not known for certain whether an object corresponding to the class is active in the current fault scenario.

The search in the UML model 430 provides a chain of classes connected by navigable associations, where the first class is always the original search class or a subclass of the original search class. Such a chain is called a server chain. A server chain is schematically illustrated in FIG. 6 as a generic example of a UML traversal algorithm.

The extension of the graph consists of creating nodes 60 and edges 62 corresponding to the classes and associations in the server chain. Pseudo code for the algorithm is presented above. The enumeration of servers using the UML model is performed in a breadth first fashion and is described in detail below.

Since the software model might be very large, and since the analysis should not rely too heavily on static class information, the search depth in the server enumeration can be limited. The search depth is the number of associations used to get from the original class to the last server in the server chain.
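A depth-limited, breadth-first enumeration of server chains could look like the sketch below. It is a simplification under stated assumptions: the UML model is reduced to a hypothetical dictionary `associations` mapping each class name to the classes reachable over one navigable association, and the handling of subclasses and superclasses is left out entirely.

```python
from collections import deque


def server_chains(associations, original_class, max_depth):
    """Enumerate server chains breadth first, up to max_depth associations."""
    chains = []
    queue = deque([[original_class]])
    while queue:
        chain = queue.popleft()
        depth = len(chain) - 1  # number of associations used so far
        if depth >= max_depth:
            continue
        for server in associations.get(chain[-1], ()):
            if server in chain:  # do not walk around a cycle in the model
                continue
            extended = chain + [server]
            chains.append(extended)
            queue.append(extended)
    return chains
```

Because the queue is processed first in, first out, all chains of search depth 1 are returned before any chain of search depth 2, so the closest servers are considered first.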

If the first requirement that the graph is connected and acyclic is fulfilled, then a measure of reliability 440 could be

$\frac{N_{\text{original edges}}}{N_{\text{added edges}} + N_{\text{original edges}}}$, which is 1 if no edges 62 needed to be added.
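The measure is simple enough to state directly in code. The following sketch assumes the two edge counts are already known; the function name `reliability` and the handling of a graph without any edges are assumptions.

```python
def reliability(n_original_edges, n_added_edges):
    """Fraction of the graph's edges that came from error messages."""
    total = n_original_edges + n_added_edges
    if total == 0:
        # Assumption: a graph without edges has no defined reliability.
        raise ValueError("graph has no edges")
    return n_original_edges / total
```

For example, a graph with three original edges and one added edge gets the value 3 / (1 + 3) = 0.75.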

The extendGraph algorithm can be improved by introducing a priority on the objects in the graph if there are several objects that could serve as a complainee for the original search object. A natural priority is to first consider nodes with classification from the “OK as is”-group as derived complainees. If that fails, nodes are considered with classification from the “Needs complainee”-group. This strategy requires either two traversals of the UML model, or a temporary storage of server chains with their last element corresponding to “Needs complainee” nodes.
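The temporary-storage variant of this priority can be sketched as a single pass over the stream of server chains. The function name `pick_chain` and the representation of the two groups as sets of class names are assumptions; the sketch also keeps only the first “Needs complainee” chain, which is a simplification of the storage described above.

```python
def pick_chain(chains, ok_as_is, needs_complainee):
    """Choose a server chain according to the two-level priority.

    chains: iterable of server chains (lists of class names)
    ok_as_is, needs_complainee: sets of names of nodes already in the graph
    Returns the first chain ending in an "OK as is" node; if there is none,
    the first chain ending in a "Needs complainee" node; otherwise None.
    """
    fallback = None
    for chain in chains:              # one traversal only
        last = chain[-1]
        if last in ok_as_is:
            return chain
        if fallback is None and last in needs_complainee:
            fallback = chain          # temporary storage
    return fallback
```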

Because superclasses of the searched servers, both the original and the others, are not returned, it has to be checked not only if the returned servers themselves are in the graph, but also if there is a superclass of the returned server already in the graph. The reason for this inconvenience is that it is desirable to avoid adding a superclass of a server to keep the graph as specific as possible, but if the superclass is already present due to error messages or other graph extensions, it should be used when extending.
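A minimal sketch of this check is given below. It assumes single inheritance, a hypothetical `parent` mapping from each class to its superclass, and that the returned server itself is preferred over any of its superclasses when both are present; none of these details are prescribed by the description.

```python
def matching_node(server, graph_classes, parent):
    """Find the graph node to connect to for a returned server.

    server: class name returned by the traversal
    graph_classes: set of class names already present in the graph
    parent: dict mapping a class to its superclass (assumed input)
    Returns the server itself if present, else its nearest superclass
    that is present, else None.
    """
    cls = server
    while cls is not None:
        if cls in graph_classes:
            return cls
        cls = parent.get(cls)
    return None
```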

A class for which servers are sought will be called a search class, and the class for which the search started is called the original search class.

A server chain is a class chain consisting of classes connected by navigable associations in the UML model. The first class is always the original search class or a subclass of the original search class. The search depth of the server chain is defined as the number of associations used to get from the original search class to the last server in the chain. This notion of closeness of classes will sometimes be referred to as the association norm, even though, strictly speaking, it is not a norm.

Since the algorithm returns a stream of server chains, it is defined in two parts, GetFirstServer and GetNextServer. Pseudo code for the two parts is presented below.

The UML Traversal Algorithm Pseudo Code

Name: GetFirstServer, GetNextServer
Input: A class (the original search class)
       Maximal search depth
       A UML model
Output: A stream of server chains
Returns: OK, FINISHED or ABORTED

State variables (common to GetFirstServer and GetNextServer):
  CurrentChain            // The server chain currently being extended
  CurrentClass            // The class currently searched for servers
  Queue                   // Holds the server chains that should be further extended
  CurrentAssociationList  // The UML associations of the current class
  Superclass              // Superclass of the last returned server
  SubclassQueue           // Subclasses of the last returned server

GetFirstServer(className, REF ServerChain) {
  clear all state variables
  set tmpClass = the class corresponding to className
  if (no class OR several classes found)
    error;
  put tmpClass in tmpChain              // One single element
  enqueue tmpChain in Queue
  for (all subclasses of tmpClass)      // To any depth
    empty tmpChain
    put the subclass in tmpChain
    enqueue tmpChain in Queue
  return GetNextServer(ServerChain)
}

GetNextServer(REF ServerChain) {
  // If there are any server subclasses previously found, return one of them
  if (SubclassQueue nonempty)
    set tmpClass = dequeue SubclassQueue
    for (subclasses of tmpClass)
      enqueue subclass in SubclassQueue
    set tmpChain = CurrentChain
    add tmpClass to tmpChain
    set ServerChain = tmpChain
    if (not too deep), enqueue ServerChain in Queue
    return OK

  if (CurrentAssociationList empty)
    if (Superclass != NULL)
      set CurrentClass = Superclass     // N.B. CurrentChain same as before.
      set Superclass = NULL
    else                                // Get a new class (chain) from the queue
      if (Queue empty), return FINISHED
      set CurrentChain = dequeue Queue
      set CurrentClass = last element in CurrentChain
    if (superclass of CurrentClass exists AND
        CurrentClass was not enqueued as a subclass of a server)
      set Superclass = superclass of CurrentClass   // Multiple inheritance not allowed.
    set CurrentAssociationList = all associations of CurrentClass

  // We have a valid association collection to search for servers
  if (CurrentAssociationList not empty)
    set tmpAssociation = pop CurrentAssociationList
    set tmpClass to the associated class of CurrentClass
    if (tmpClass is a server of CurrentClass AND
        tmpClass not equal to the second to last class in CurrentChain)
      set tmpChain = CurrentChain
      add tmpClass to tmpChain
      set ServerChain = tmpChain
      for (subclasses of tmpClass)      // Only first level inheritance
        enqueue subclass in SubclassQueue
      if (not too deep), enqueue ServerChain in Queue
      return OK

  return GetNextServer(ServerChain)     // CurrentAssociationList empty
}
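The traversal can also be approximated by a short Python generator. This is a simplified sketch, not a line-by-line port of the pseudo code: the function name `server_chains` and the two input mappings (`servers`, from a class to the classes it reaches over navigable associations, and `parent`, from a class to its single superclass) are assumptions about how a UML model might be represented. The sketch also assumes that chains which end in a subclass, including the initial subclasses of the original search class, do not search their superclasses again.

```python
from collections import deque


def server_chains(start, servers, parent, max_depth):
    """Yield server chains for `start` in breadth-first order.

    servers: dict, class -> list of server classes (assumed model format)
    parent:  dict, class -> superclass, single inheritance (assumed)
    max_depth: maximal number of associations in a chain
    """
    # Invert the superclass map once so subclasses can be looked up.
    children = {}
    for cls, sup in parent.items():
        children.setdefault(sup, []).append(cls)

    def descendants(cls):
        """All subclasses of cls, to any depth, in breadth-first order."""
        found, todo = [], deque(children.get(cls, []))
        while todo:
            sub = todo.popleft()
            found.append(sub)
            todo.extend(children.get(sub, []))
        return found

    # Queue entries are (chain, search_superclasses).
    queue = deque([([start], True)])
    queue.extend(([sub], False) for sub in descendants(start))
    while queue:
        chain, walk_super = queue.popleft()
        cls = chain[-1]
        while cls is not None:
            for server in servers.get(cls, []):
                if len(chain) >= 2 and server == chain[-2]:
                    continue  # do not step straight back along the chain
                # The server first, then its subclasses in its place.
                for i, target in enumerate([server] + descendants(server)):
                    new_chain = chain + [target]
                    yield new_chain
                    if len(new_chain) - 1 < max_depth:
                        queue.append((new_chain, i == 0))
            # Associations are inherited, so search the superclass too,
            # but never put the superclass itself into the chain.
            cls = parent.get(cls) if walk_super else None
```

With a small hypothetical model in which `red` has the server `A1`, its superclass `redSup` has the server `D1`, and its subclass `redSub` has the server `C1`, the generator produces chains that start with `red` for the inherited association and with `redSub` for the subclass association, in the same spirit as the entries of Table 1. The model is made up for the example and does not reproduce FIG. 6.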

FIG. 6 illustrates a traversal order of the UML model when searching for possible complainees for an original search class named “red”.

An example with a fairly complicated class structure is shown in FIG. 6, and all server chains produced by the algorithm for the original search class “red” are listed in Table 1. The search depth used is 2. The numbering of the classes has the semantics <search depth>.<running number>.

Both super- and subclasses of the original search class are searched for servers, see classes 1.6, and 1.7 and 1.8, respectively, in Table 1. Superclasses need to be searched because the associations are inherited, and subclasses need to be searched since the error message or graph extension that introduced the original search class might have pointed out a superclass of the class actually active in the fault scenario under consideration.

All subclasses of a possible server are enumerated as possible servers in their own right, see 1.3 to 1.5 in Table 1. In these server chains, only the subclass and not the original server is present, since the association leading to a parent is considered inherited.

Superclasses of a server are not returned, since the association leading to the server specifically excludes the superclass. For example, the classes “redSup” and “B1” in Table 1 are not returned. Servers of the superclasses are returned, though, since that association is inherited. In the server chain, it is chosen not to include the superclass itself since the association is considered inherited.

In a preferred embodiment, the system of the present invention is intended for isolating a fault from a plurality of error messages in a system log 200 of a process controlled by an object oriented program 110. Each error message of the plurality of error messages includes a predesigned signature, and an interface is located between the program and a process control. The interface includes elements for clustering 210 error messages belonging to a fault scenario of the fault, and means for forming from the signatures of the clustered 210 error messages a formal representation, a base graph 410, of the fault scenario. It further includes elements for analyzing 230 the formal representation to isolate the fault, and elements for reporting information containing the fault to the process control, which provides measures taking care of the fault.

Hence, the process control is able to maintain, repair, diagnose etc. the detected fault much more quickly than is known from the prior art.

A signature includes information on complaint, complainer and complainee, and the signature is formed from an explanation model comprising an underlying assumption of communication between system entities needed to carry out a specific task. System entities include objects 60, packages and threads.
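As an illustration of how such signatures could feed a base graph, the sketch below models a signature as a small record and turns a cluster of signatures into nodes and directed edges. The names `Signature` and `base_graph`, the edge direction from complainer to complainee, and the possibility of a missing complainee are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    complaint: str                    # what is being complained about
    complainer: str                   # system entity issuing the message
    complainee: Optional[str] = None  # assumed to be optional here


def base_graph(signatures):
    """Build (nodes, edges) from the signatures of clustered messages.

    An edge (complainer, complainee) records that one entity complains
    about another; the direction is an assumption of this sketch.
    """
    nodes, edges = set(), set()
    for sig in signatures:
        nodes.add(sig.complainer)
        if sig.complainee is not None:
            nodes.add(sig.complainee)
            edges.add((sig.complainer, sig.complainee))
    return nodes, edges
```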

In one embodiment of the present invention, the time period starts with the first error message in the fault scenario and ends with a stop message.
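A minimal sketch of clustering by such a time period is shown below. The log format, a sequence of `(kind, text)` pairs with the kinds "error" and "stop", and the function name `cluster_fault_scenarios` are assumptions; closing a cluster at the end of the log when no stop message arrives is also an assumption.

```python
def cluster_fault_scenarios(log):
    """Group error messages into clusters, one per fault scenario.

    log: iterable of (kind, text) pairs in log order (assumed format).
    A cluster opens at the first error message and closes at the next
    stop message; other kinds of entries are ignored.
    """
    clusters, current = [], None
    for kind, text in log:
        if kind == "error":
            if current is None:
                current = []          # first error opens the time period
            current.append(text)
        elif kind == "stop" and current is not None:
            clusters.append(current)  # stop message closes the period
            current = None
    if current is not None:
        clusters.append(current)      # assumption: log ended without stop
    return clusters
```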

Further, in an embodiment of the invention, the system includes elements for establishing a reliability estimate 440 of the fault, and the fault information contains the estimate.

If the base graph 410 is inconclusive, the system further includes, in one embodiment of the invention, elements for extending it by providing a system model 220, containing information on dependency relations between system entities, and elements for isolating the fault from the extended graph 420 by using predetermined explanation rules.

In one embodiment, the system according to the present invention is used in a process involving an industrial robot.

Another embodiment of the present invention provides a computer program code element including computer code for enabling a processor to isolate a fault from a plurality of error messages in a system log 200 of a process controlled by an object oriented program 110, thereby enabling a processor to carry out the steps of:

forming an interface between the program 110 producing the plurality of error messages and a process control,

creating a cluster 210 from the plurality of error messages belonging to a fault scenario of the fault,

analyzing the cluster 400 using predefined error message signatures and a predetermined system model 220 to isolate the fault, and

reporting information containing the isolated fault to the process control, which provides measures taking care of the fault. The code element is in one embodiment of the present invention contained in a computer readable medium.

Further, the code element is supplied at least in part over a network such as the Internet in an embodiment of the present invention.

In another embodiment of the present invention, a computer readable medium is provided, containing a computer program code element including computer code for enabling a processor to isolate a fault from a plurality of error messages in a system log 200 of a process controlled by an object oriented program 110, whereby the processor is enabled to carry out the steps of:

forming an interface between the program producing the plurality of error messages and a process control,

creating a cluster 210, 400 from the plurality of error messages belonging to a fault scenario of the fault,

analyzing 230 the cluster 400 using predefined error message signatures and a predetermined system model 220 to isolate the fault, and

reporting information containing the isolated fault to the process control, which provides measures taking care of the fault.

Also, the system according to the present invention is able to perform the features described above for the method.

Elements used in the interface are, for example, software and software drivers, any kind of suitable display etc. and other elements known to a person skilled in the art.

It is thus believed that the operation and the embodiments of the present invention will be apparent, for a person skilled in the art, from the foregoing description. While the method and interface shown or described have been characterized as preferred, it will be obvious that various changes and modifications may be made therein without departing from the spirit and scope of the present invention as defined in the attached claims.

TABLE 1

1.1  red        A1
1.2  red        BB1
1.3  red        F1
1.4  red        E1
1.5  red        FF1
1.6  red        D1
1.7  redSub     C1
1.8  redSubsub  A2
2.1  red        A1–A2
2.2  red        BB1–BB2
2.3  red        BB1–B2
2.4  red        E1–E2

1. A method for isolating a fault from a plurality of error messages in a system log of a process controlled by an object oriented program, the method comprising: creating a plurality of predefined signatures comprising a predefined signature for each error message of said plurality of error messages; forming a time period to frame from said plurality of error messages the error messages belonging to a fault scenario of said fault; clustering said error messages that fall within said time period; forming from said predefined signatures of said clustered error messages a base graph, which is a formal representation of said fault scenario, carrying cause-effect relation information of said clustered error messages; forming an explanation model from an underlying assumption of communication between system entities needed to carry out a specific task, the system entities comprising objects, packages and threads, and having said predefined signatures defined on a base of said explanation model; analyzing said base graph to isolate said fault; and reporting information concerning said fault.
2. The method according to claim 1, wherein each of said signatures is brought to contain information on a complaint and a complainer.
3. The method according to claim 1, wherein said reported information comprises a reliability estimate of said fault.
4. The method according to claim 1, wherein the information concerning the fault is reported to a control means, which provides measures for resolving the fault.
5. The method according to claim 1, wherein the process comprises operation of an industrial robot.
6. A method for isolating a fault from a plurality of error messages in a system log of a process controlled by an object oriented program, the method comprising: creating a plurality of predefined signatures comprising a predefined signature for each error message of said plurality of error messages; forming a time period to frame from said plurality of error messages the error messages belonging to a fault scenario of said fault, wherein said time period starts with the first error message in said fault scenario and ends with a stop message; clustering said error messages that fall within said time period; forming from said predefined signatures of said clustered error messages a base graph, which is a formal representation of said fault scenario, carrying cause-effect relation information of said clustered error messages; analyzing said base graph to isolate said fault; and reporting information concerning said fault.
7. A method for isolating a fault from a plurality of error messages in a system log of a process controlled by an object oriented program, the method comprising: creating a plurality of predefined signatures comprising a predefined signature for each error message of said plurality of error messages; forming a time period to frame from said plurality of error messages the error messages belonging to a fault scenario of said fault; clustering said error messages that fall within said time period; forming from said predefined signatures of said clustered error messages a base graph, which is a formal representation of said fault scenario, carrying cause-effect relation information of said clustered error messages; analyzing said base graph to isolate said fault, wherein if the base graph is inconclusive said analyzing further comprises extending said base graph with the help of a system model including information on the dependency relation between system entities, and analyzing said extended base graph with a set of predetermined explanation rules to isolate said fault; and reporting information concerning said fault.
8. A system for isolating a fault from a plurality of error messages in a system log of a process controlled by an object oriented program, the system comprising: a plurality of predefined signatures comprising a predefined signature for each error message of said plurality of error messages, wherein said predefined signatures comprise information on complaint, complainer and complainee, and wherein said predefined signatures are formed from an explanation model comprising an underlying assumption of communication between system entities needed to carry out a specific task, said system entities comprise objects, packages and threads; and an interface located between the object oriented program and a means for process control, said interface comprising means for clustering error messages belonging to a fault scenario of said fault, means for forming from said signature of said clustered error messages a formal representation of said fault scenario, means for analyzing said formal representation to isolate said fault, and means for reporting information concerning said fault.
9. The system according to claim 8, wherein said means for reporting reports the information concerning the fault to said means for process control, which provides measures for resolving the fault.
10. The system according to claim 8, wherein said formal representation comprises a base graph.
11. The system according to claim 8, further comprising: means for establishing a reliability estimate of said fault, wherein said fault information contains said reliability estimate.
12. The system according to claim 8, wherein the process is carried out by an industrial robot.
13. A system for isolating a fault from a plurality of error messages in a system log of a process controlled by an object oriented program, the system comprising: a plurality of predefined signatures comprising a predefined signature for each error message of said plurality of error messages; a time period starts with a first of said plurality of error messages in said fault scenario and ends with a stop message; and an interface located between the object oriented program and a means for process control, said interface comprising means for clustering error messages belonging to a fault scenario of said fault, means for forming from said signature of said clustered error messages a formal representation of said fault scenario, means for analyzing said formal representation to isolate said fault, and means for reporting information concerning said fault.
14. A system for isolating a fault from a plurality of error messages in a system log of a process controlled by an object oriented program, the system comprising: a plurality of predefined signatures comprising a predefined signature for each error message of said plurality of error messages; and an interface located between the object oriented program and a means for process control, said interface comprising means for clustering error messages belonging to a fault scenario of said fault, means for forming from said signature of said clustered error messages a formal representation of said fault scenario, wherein said formal representation comprises a base graph, means for analyzing said formal representation to isolate said fault, wherein if said base graph is inconclusive the system further comprises means for extending said base graph with the help of a system model containing information on a dependency relation between system entities and means for isolating said fault from said extended base graph by using predetermined explanation rules, and means for reporting information concerning said fault.